---
title: mahoudata
keywords: fastai
sidebar: home_sidebar
summary: "API details."
---
The main idea of the data processing is to compute similarities among beers. To do so, we are going to compute similarities with regard to:
TODO:
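import pandas as pd
from pandas_profiling import ProfileReport  # assumed package name; newer releases ship as ydata_profiling
df = pd.read_csv("./data/dataset-datathon.csv")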
profile = ProfileReport(df, title='Pandas Profiling Report', html={'style':{'full_width':True}})
profile.to_notebook_iframe()
According to the profile report, 60% of the rows are duplicates, so we drop them.
df_clean = df.drop_duplicates(
#subset = df.columns.difference(['vajilla'])
)
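As a minimal sketch of what `drop_duplicates` does with and without the commented-out `subset` argument, consider the toy frame below. The values are invented for illustration; only the `vajilla` column name comes from the code above.

```python
import pandas as pd

# Toy data, invented for illustration; not taken from the real dataset.
toy = pd.DataFrame({
    "amargor": [3, 3, 3, 5],
    "color": [2, 2, 2, 4],
    "vajilla": ["copa", "copa", "vaso", "vaso"],
})

# Default: a row is a duplicate only if every column matches an earlier row.
full_dedup = toy.drop_duplicates()

# With subset: compare all columns except 'vajilla', as in the commented-out line.
subset_dedup = toy.drop_duplicates(subset=toy.columns.difference(["vajilla"]))

print(len(toy), len(full_dedup), len(subset_dedup))
```

Ignoring `vajilla` removes more rows, because beers that differ only in glassware are then treated as the same record.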
profile = ProfileReport(df_clean, title='Pandas Profiling Report', html={'style':{'full_width':True}})
profile.to_notebook_iframe()
context = {
    'numeric_cols': [
        'lupulo_afrutado_citrico', 'lupulo_floral_herbal', 'amargor', 'color',
        'maltoso', 'licoroso', 'afrutado', 'especias', 'acidez',
    ]
}
f = RecomenderStrategyFactory(context)
strategy = f.createStrategy('numeric')
datamodel = strategy.model_builder(df_clean)
recommender_df = strategy.exec_strategy(datamodel)
recommender_df
recommendations_example = pd.DataFrame(recommender_df[1].sort_values(ascending=True))
recommendations_example
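The internals of `RecomenderStrategyFactory` are not shown on this page. As a rough sketch, assuming the `'numeric'` strategy computes pairwise cosine distances over the columns in `numeric_cols` (an assumption, suggested by the `'cosine distance'` label used later in this notebook), a comparable matrix could be built with NumPy. The helper name `cosine_distance_matrix` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def cosine_distance_matrix(features: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    x = features.to_numpy(dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / norms  # assumes no all-zero rows
    dist = 1.0 - unit @ unit.T
    # Clip tiny out-of-range values caused by floating-point rounding.
    dist = np.clip(dist, 0.0, 2.0)
    return pd.DataFrame(dist, index=features.index, columns=features.index)


# Toy feature table, invented for illustration.
toy = pd.DataFrame(
    {"amargor": [1.0, 2.0, 0.0], "color": [0.0, 0.0, 3.0]},
    index=[0, 1, 2],
)
distances = cosine_distance_matrix(toy)

# Closest beers to beer 1 first (smallest distance = most similar).
ranking = distances[1].sort_values(ascending=True)
print(ranking)
```

Sorting in ascending order puts the most similar beers first, since a smaller cosine distance means the two feature vectors point in closer directions.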
The code below is work in progress.
#Reshape to long form
long_form_cosine = recommender_df.unstack()
#rename columns and turn into a dataframe
long_form_cosine.index.rename(['Beer A', 'Beer B'], inplace=True)
long_form_cosine = long_form_cosine.to_frame('cosine distance').reset_index()
long_form_cosine
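To illustrate the reshape on a small scale, here is a sketch with a made-up 2×2 distance matrix (values and beer ids invented for illustration). It uses `rename_axis` rather than the in-place index rename, which should give an equivalent result.

```python
import pandas as pd

# Toy symmetric distance matrix; ids 10 and 20 are hypothetical beer ids.
toy = pd.DataFrame([[0.0, 0.3], [0.3, 0.0]], index=[10, 20], columns=[10, 20])

long_form = (
    toy.unstack()                        # Series indexed by (column id, row id)
    .rename_axis(["Beer A", "Beer B"])   # name the two index levels
    .to_frame("cosine distance")
    .reset_index()                       # turn the index levels into columns
)
print(long_form)
```

Each pair appears twice (A to B and B to A) along with the zero-distance self pairs, so a later step may want to filter those out.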
#df['tokenized_desc'] = df['desc'].apply(word_tokenize)